A CC-NUMA Prototype Card for SCI-Based PC Clustering

Authors

  • Sang-Hwa Chung
  • Soo-Cheol Oh
  • Sejin Park
  • Hankook Jang
  • Chi-Jung Ha
Abstract

It is extremely important to minimize network access time when constructing a high-performance PC cluster system. For an SCI-based PC cluster, the network access time can be reduced by maintaining a network cache in each cluster node. This paper presents a CC-NUMA card that utilizes a network cache for SCI-based PC clustering. The CC-NUMA card is plugged directly into the PCI slot of each node and contains shared memory, a network cache, and interconnection modules. The network cache is maintained for the shared memory located on the PCI buses of the cluster nodes. The coherency mechanism between the network cache and the shared memory is based on the IEEE SCI standard. A CC-NUMA prototype card was developed to evaluate the performance of the system. According to the experiments, the cluster system with the CC-NUMA card showed considerable improvements compared with an SCI-based cluster without a network cache.

1. The CC-NUMA based PC cluster system

Figure 1 shows the structure of the CC-NUMA based PC cluster system. As shown in the figure, the CC-NUMA card is designed as a plug-in card for the PCI slot of each cluster node and contains a shared memory control module, shared memory, a network cache, a network control module, and an SCI directory. The CC-NUMA card is located on the PCI bus because motherboards developed for PCs do not provide a system bus interface. Since the CC-NUMA card on the PCI bus cannot snoop the system bus, it is not possible to share the local memory space on the system bus with other cluster nodes; thus the shared memory is also located on the PCI bus. The shared memory space in each node is assigned a unique PCI global address. Each SCI directory maintains sharing lists, based on the SCI protocol, for the shared memory.

The shared memory control module is based on the SCI cache coherence algorithm and processes read/write transactions to the shared memory delivered from the local or a remote CPU. For a remote read/write transaction issued by the local CPU, the network cache is consulted first. If the requested data is not available in the network cache, the transaction is transferred to the network control module. The network control module manages the sharing lists in the SCI directory and handles read/write transactions and invalidation messages from either the shared memory control module or remote nodes.

Figure 1. The CC-NUMA based PC cluster system

2. The CC-NUMA prototype card

As shown in Figure 2, the CC-NUMA prototype card was developed as a preliminary version of the CC-NUMA card presented in Section 1. The prototype card concentrates on the shared memory control module, which is implemented in a single FPGA (Xilinx Virtex XCV400-4HQ240). The shared memory and the network cache are stored in DRAM, while the tag information is stored in SRAM for speed. In the prototype system, the network control module is replaced by Dolphin's PCI-SCI card; the design of the prototype can therefore focus on the CC-NUMA control mechanism without worrying about the details of the SCI network. The cache coherency mechanism is based on the typical set of the SCI coherence protocol.
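To make the read path from Section 1 concrete, the following is a minimal software simulation of the described behavior, not the card's design: a remote read issued by the local CPU probes the network cache first and generates an SCI network transaction only on a miss. All names (shared_mem_read, net_ctrl_remote_read, the direct-mapped cache layout and sizes) are hypothetical; in the real card this logic lives in the FPGA-based shared memory control module and is handled in hardware.

```c
/* Minimal software sketch of the remote-read path from Section 1:
 * the local CPU's read probes the network cache first and falls
 * through to the (simulated) SCI network only on a miss.
 * All names and sizes here are invented for illustration. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NC_LINES 256                    /* direct-mapped network cache */

typedef struct { bool valid; uint32_t tag; uint32_t data; } nc_line_t;

static nc_line_t net_cache[NC_LINES];
static uint32_t  remote_mem[4096];      /* stands in for a remote node's
                                           shared memory on its PCI bus */
static unsigned  network_reads;         /* counts simulated SCI accesses */

/* Network control module: in the prototype this role is played by the
 * Dolphin PCI-SCI card; here it is a plain array read plus a counter. */
static uint32_t net_ctrl_remote_read(uint32_t addr)
{
    network_reads++;
    return remote_mem[addr];
}

/* Remote read issued by the local CPU: network cache first, network second. */
static uint32_t shared_mem_read(uint32_t addr)
{
    nc_line_t *line = &net_cache[addr % NC_LINES];
    if (line->valid && line->tag == addr)
        return line->data;              /* hit: no network access time */

    uint32_t data = net_ctrl_remote_read(addr);
    line->valid = true;                 /* fill the line for later reuse */
    line->tag   = addr;
    line->data  = data;
    return data;
}

int main(void)
{
    remote_mem[42] = 0xCAFE;
    shared_mem_read(42);                /* miss: goes over the network */
    shared_mem_read(42);                /* hit: served from network cache */
    printf("network reads = %u\n", network_reads);  /* prints 1 */
    return 0;
}
```

Compiled and run, the second read hits in the network cache and the counter reports a single network access, which is exactly the saving the network cache is intended to provide.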
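The coherence side can be sketched similarly. In the IEEE SCI standard, on which the card's typical-set protocol is based, the directory at the home memory points to the head of a distributed doubly linked sharing list: a new reader attaches itself at the head, and a writer purges the rest of the list with one invalidation message per sharer. The array-based toy below illustrates only this list discipline under those assumptions; it is not the card's hardware state machine, and all names are invented for the example.

```c
/* Toy model of the SCI sharing list that each SCI directory maintains
 * (Section 1) and that the typical-set protocol manipulates (Section 2):
 * the directory points at the head of a doubly linked list of sharers;
 * a writer at the head purges the tail with invalidation messages. */
#include <stdio.h>

#define MAX_NODES 8
#define NIL       (-1)

static int fwd[MAX_NODES], bwd[MAX_NODES];  /* per-node list pointers */
static int head = NIL;                      /* directory: list head   */

/* A node that caches the line prepends itself to the sharing list. */
static void attach_head(int node)
{
    fwd[node] = head;
    bwd[node] = NIL;
    if (head != NIL)
        bwd[head] = node;
    head = node;
}

/* A writer at the head invalidates every other sharer, one
 * invalidation message per list node, leaving itself the only copy. */
static void purge_from(int writer)
{
    int n = fwd[writer];
    while (n != NIL) {
        int next = fwd[n];
        printf("invalidate network cache of node %d\n", n);
        fwd[n] = bwd[n] = NIL;
        n = next;
    }
    fwd[writer] = NIL;
    head = writer;
}

int main(void)
{
    attach_head(2);          /* nodes 2, 5, 7 read the same line */
    attach_head(5);
    attach_head(7);
    purge_from(7);           /* node 7 writes: nodes 5 and 2 invalidated */
    return 0;
}
```

SCI keeps the list doubly linked so that a node evicting a line can detach itself from the middle of the list (rollout); this sketch omits that case.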


Similar Papers

Performance analysis on a CC-NUMA prototype

Cache-coherent nonuniform memory access (CC-NUMA) machines have been shown to be a promising paradigm for exploiting distributed execution. CC-NUMA systems can provide performance typically associated with parallel machines, without the high cost associated with parallel programming. This is because a single image of memory is provided on a CC-NUMA machine. Past research on CC-NUMA machines has...

Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers

This paper proposes coarse grain task parallelization for an earthquake simulation program that uses the Finite Difference Method to solve the wave equations in a 3-D heterogeneous structure, the Ground Motion Simulator (GMS), on various cc-NUMA servers using IBM, Intel and Fujitsu multicore processors. The GMS has been developed by the National Research Institute for Earth Science and Disaster Preventio...

Data Distribution, Migration and Replication on a cc-NUMA Architecture

It is well known that, although cc-NUMA architectures allow construction of large scale shared memory systems, they are more difficult to program effectively because data locality is an important consideration. Support for specifying data distribution in OpenMP has been the subject of much debate [1], [4], and several proposed implementations. These take the form of data distribution directives...

Global Management of Coherent Shared Memory on an SCI Cluster

The I/O-based implementations of the SCI standard allow cost-efficient use of shared memory on a wide range of cluster architectures. These implementations have typically been used for message-passing interfaces, but we are exploiting the use of I/O-based SCI as a way to create NUMA architectures with commodity components. A major issue is that data placement and especially data consistency bec...

Volume Driven Data Distribution for NUMA-Machines

Highly scalable parallel computers, e.g. SCI-coupled workstation clusters, are NUMA architectures. Thus good static locality is essential for high performance and scalability of parallel programs on these machines. This paper describes novel techniques to optimize static locality at compilation time by application of data transformations and data distributions. The metric which guides the optim...


Publication date: 2000